perm filename MATH.THY[DIS,DBL] blob
sn#212610 filedate 1976-04-23 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00007 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 Mathematics Research: a model research/textbook orders an example.
C00005 00003 . SSSEC(Model of Math Research)
C00016 00004 ↓_Textbook-Order vs Research-Order_↓
C00024 00005 Let's make this more concrete, by considering how a mathematician might discover
C00036 00006 ↓_Metaphors to other processes_↓
C00044 00007 . SSSEC(AM as Heuristic Search)
C00063 ENDMK
C⊗;
Mathematics Research: a model; research/textbook orders; an example.
Section 1 bares the underlying basis of AM: all the previously-hidden
assumptions about what constitutes "doing math research".
First, let me
"admit" the existence of such a basis of beliefs, an interlocking set of
constraints which are (i) believed by the author, (ii) quietly interwoven
throughout AM, and yet (iii) can't be readily tested by AM.
Much of this foundation deals with what to ⊗4ignore⊗*: incubation, subconscious,
human drives and taboos, political climate, and factors which I'm not even aware
of ignoring.
The rest of the basis is more positive. In this section, a detailed model of
-- perhaps a recipe for -- making math discoveries will be expounded. The reader
will see that AM encorporates many of the details, in all-encompassing ways which
can't be separated from the rest of the system and examined.
For example, there is the postulate that heuristics can be useful for both
suggesting new things to consider, and for keeping that space of plausible items
pruned to a reasonable size. By the very manner that heruistic rules are
coded and used, these two functions are blended together in AM.
$$ A mathematical
theory encompasses (i) a basis of primitive objects and activities, (ii) a
foundation of axiomatic constraints on the basis, (iii) a body of definitions
tying together basic and already-defined concepts, (iv) a body of theorems which
are implied by the foundation and the definitions, (v) some interpretations in
terms of either reality or other theories.$
. SSSEC(Model of Math Research)
In order to intelligently discuss how to automate an activity, we must be very
clear about how humans do that activity. Thus, for AM, we must hypothesize
a particular model of how mathematicians actually go about doing their research.
Thanks to Polya, Kersher, Hadamard, Skemp, many others, and introspection,
I have pieced together a tentative such information processing
model for math theory formation:
.ONCE SELECT 8 TURN OFF "α" TURN ON "∞→"
⊂∞α→⊃
.BN NARROW 2,2 SINGLE SPACE PREFACE 0
1. The order of events in a typical mathematical investigation is:
(a) OBSERVE:
The observation is either of reality or of an analogous, already-established
mathematical theory.
(b) NOTICE REGULARITY: Perceive some patterns, some interesting relationships
that appear to hold sometimes.
Thus math research is an ⊗4empirical⊗* process.
(c) FORMALIZE: Decide on some of the objects, operators,
definitions, and statements that the theory will contain.
(d) FINALIZE: Decide which concepts are to be primitve and which aren't;
decide which statements will be considered axioms, and ensure that the others
can in fact be derived from them.
(e) DEVELOP: What additional theorems can be proven from this formal system;
do they correspond to observable phenomena in the domain which motivated
this new theory? When new observations are made in that motivating domain,
can they be naturally phrased as formal statements in this theory; and if so,
are they provable from the existing axioms and theorems?
2.
Notice that each step in (1) involves choosing from a large set of alternatives
-- that is, searching.
The key to keeping this from becoming a blind, explosive search is the
proper use of evaluation criteria. That is, one must constantly choose the
most interesting, aesthetically pleasing, useful,... alternative available.
This is analogous to Dendral's reliance on good heuristics to constrain the
structure generator.
3.
But many of those criteria are usually opposed to each other
(e.g., one often must sacrifice
elegance for utility, interestingness for safety, etc.). How should one
weight these features when deciding what to do next during an investigation?
We believe (and one goal of AM is to test) that
the non-formal criteria (aesthetics, interestingness, inductive inference (from
empirical evidence), analogy, intuitive clarity, utility)
are much more important than formal
deductive methods in developing mathematically worthwhile theories, and
in avoiding barren diversions.
Among the subjective criteria, the order listed above is roughly their order
of importance. However, AM should have a dynamically variable "orientation",
which under certain circumstances might induce it to seek safety (e.g., utility)
rather than uncertainty (e.g., proposing an analogous proposition).
4. The above criteria are virtually the same in all domains of mathematics,
and at all levels. Each factor encourages some pursuits and discourages others.
It is hoped that no modifications need be made to AM's judgmental scheme,
as AM acquires more and more new concepts.
5. For true understanding, AM should grasp$$ Have access to, relate to, store,
be able to manipulate, be able to answer questions about $
each concept in several ways:
declarative (definition), operational (how to use it), demonic (recognizing
when it is relevant), exemplary (especially boundary examples), and
intuitive (simulated image of a real-world interpretation).
6. Progress in ⊗4any⊗* field of mathematics demands much informal heuristic
knowledge (and some
formal knowledge) of ⊗4many⊗* different "nearby"
mathematical fields. So a broad,
universal core of intuition must be established before any single theory
can meaningfully be developed.
Intuition is contrasted with more formal representations by the fact that it
is opaque (AM cannot introspect to determine how the result is produced) and
fallable.
.WIDEN 2,2
.ONCE SELECT 8 TURN OFF "α" TURN ON "∞→"
%∞α→$
.END
.SKIP 2
This last point (6) merits elaboration. I believe that to develop any given field
of mathematics, one must possess much intuition and heuristics drawn from each
⊗4psychologically⊗* prerequisite field,$$ One which makes the new field
easier to learn, which contains more concrete analogues of the ideas of the new
field. For example, knowing about geometry makes it easier to learn about
topology, even though topology never formally uses any results or definitions
from geometry. $ and some definite facts about each ⊗4formally⊗* preceding
field$$ For example, arithmetic is usually formally based upon set theory,
although most of us learn about them in the other order $ of mathematics.
The diagram here
indicates the prerequisites for each domain which might
conceivably be worked in by AM. To start in a given domain, some knowledge
should exist about all domains which point to the given one,
about all domains which point to ⊗4them⊗*, etc.
⊗4NOTE: This section, and especially the one after it, are
supplementary, and may be skipped on first reading.⊗*
.B0 TURN OFF "α"
Elementary Logic ααα→ Theorem-Proving ααααααααααααααα⊃
↑ ~
~ ~
~ ~
εαααααααα→ Geometry ααα→ Topology ~
~ ~ ↑ ~ ~
~ ~ ~ ~ ~
~ ↓ ~ ↓ ~
~ Analytic Geometry ~ Algebraic Topology ~
~ ↑ ↓ ↑ ~
~ ~ Measure Theory ~ ~
↓ ~ ↑ ~ ~
Boolean Algebra αβαβαα→ Abstract Algebra ~
↑ ~ ~ ~ ↓
.ONCE TURN ON "α"
~ ~ ↓ ~ Program Verification
~ Analysis ↓ ↑
~ ↑ Concrete Algebra ~
~ ~ ↑ ~
~ ~ ~ ~
↓ ~ ~ ~
Set Theory ααα→ Arithmetic ααα→ Number Theory ~
~ ~
~ ~
↓ ~
Combinatorics ←ααα→ Graph Theory αααα$
.E
Each arrow represents either a psychological or a formal prerequisite:
To work in the field pointed ↓_to_↓, one should know something about the field
pointed ↓_from_↓.
Notice that almost all the "elementary"
branches of mathematics are either formally
or psychologically prerequisite
to each other.$$ For example, one should have some intuition about
sets before doing Number theory,
and one should have some grasp of numbers
and counting before doing Set theory. $ So a broad foundation of intuition,
spanning ⊗4↓_several_↓⊗* mathematical and
real-world concepts, is prerequisite to sophisticated behavior in ⊗4any⊗*
branch of mathematics.
This smacks of the idea of a "critical mass" of knowledge,
so often sensationalized in science fiction. In addition to
expecting that the corpus must exceed some minimum absolute size,
I am claiming that it must also exceed some minimum degree of richness, of breadth.
↓_Textbook-Order vs Research-Order_↓
Let us elaborate on point (1) of the model. The idea there is that
the order in which a math textbook presents a theory is almost the exact
opposite of the order in which it was actually discovered and developed.
.B0 INDENT 10 TURN OFF "α"
PRIMITIVES, AXIOMS ααααα→ DEFINITIONS ←αααααααααααααα⊃
~ ~
~ ~
↓ ~
STATEMENT OF LEMMA ~
~ ~
~ ~
↓ ~
PROOF OF LEMMA ~
~ ~
~ ~
↓ ~
~ ~
~ ~
↓ ~
STATEMENT OF NEXT LEMMA ~
⊂αααααααα⊃ ~ ~
~textbook~ | ~
~ order ~ | ~
%αααααααα$ | ~
| ~
| ~
↓ ~
PROOF OF LAST LEMMA ~
~ ~
~ ~
↓ ~
STATEMENT OF THEOREM ~
~ ~
~ ~
↓ ~
PROOF OF THEOREM ααααααααααααααα$
.E
In a textbook, one introduces some primitive elements of the theory, postulates
some axioms relating these objects and operations, then enters a
⊗4"Define → State → Prove⊗*" loop. Lemmas are stated and proved before the
theorems which require them. Motivations of any kind are a rareity, and often
are mentioned parenthetically as an application of a theorem which has just been
proved.
As an example, here is how my sophomore textbook (Introduction to
Abstract Algebra) was organized:
.B0
.INDENT 10
.TURN ON "α"
Set N, Op ⊗1Successor:N→N,⊗*
Peano⊗1'⊗*s Axioms αααααααααααα→ Define +
~
↓
State some property of +
~
↓
Prove this propterty
|
|
|
Define ⊗6≤⊗*
|
|
|
Define g.l.b
~
~
↓
State that glb always exists
~
↓
Prove this claim
|
|
|
Define TIMES
Define ⊗5Z⊗*
|
|
|
Define DIVIDES
Define PRIME
~
↓
State Euclid⊗1'⊗*s Algorithm
~
↓
Prove Euclid⊗1'⊗*s Algorithm
.E
It began by presenting Peano's axioms, completely unmotivated, then entered
the the Define → State → Prove loop. For example, the two concepts of
⊗4prime⊗* and ⊗4non-factorizable number⊗* are defined separately,
even though a hundred pages elapse before any mathematical system is presented
in which the two notions don't completely coincide.
This psychic ordering is repeated,
in microcosm, in each textbook proof. For example,
a typical epsilon/delta argument will start by magically proposing some function
delta(epsilon), which is just the right one for the approximations taken later.
In contrast,
a mathematician doing research works in almost the opposite order.
.B0 INDENT 4 TURN OFF "α"
OBSERVE x αααα→ NOTICE SOME INTERESTING REGULARITIES
~
~
↓
GATHER UP ALL RELATED INTERESTING RELATIONSHIPS
~
~ ⊂αααααααα⊃
↓ ~research~
STATE THESE FORMALLY ~ order ~
~ %αααααααα$
~
↓
CHOOSE THE MOST INTUITIVELY OBVIOUS OF THESE
~
~
↓
TRY TO DERIVE ALL THE REST FROM THIS CORE
~
~
↓
IF THE CORE IS INCONSISTENT, ELIMINATE THE LEAST BELIEVED MEMBERS
IF ANY CAN⊗6'⊗*T BE DERIVED: ENLARGE THE CORE SO THAT THEY CAN BE
IF ANY MEMBER OF THE CORE CAN BE DERIVED FROM THE OTHERS, ELIMINATE IT
~
~
↓
CALL THE CORE "AXIOMS", AND THE REST "THEOREMS"
~
~
↓
TRY TO DERIVE PREVIOUSLY UNSUSPECTED THEOREMS, FORMALLY
IF SO: TRY TO REINTERPRET IN TERMS OF x
TRY TO FIND NEW RELATIONSHIPS OCCURRING IN x
IF SO: TRY TO PROVE THEM USING THE EXISTING AXIOMS AND THEOREMS
.E
The researcher begins by observing the world: he looks at scenarios from reality
and perhaps also some earlier, already-developed, interesting mathematical theories.
He notices some interesting facts, either by raw perception of regularities
or by analogy to some interesting pattern in another theory. Of these observations,
he chooses a small set of the most intuitively clear ones, and tries to derive the
other observations formally from this chosen core. After several passes, he will
succeed; the final chosen core he calls axioms, and the rest he calls theorems.$$
Once axiomatized, new theorems may be proposed syntactically: the researcher then
may check each conjecture against the roots of the theory (both real-world and
analogous). Additional observations can later be checked against the theory,
to see if they also can be derived from the axioms. $
Let's make this more concrete, by considering how a mathematician might discover
and develop the simplest numerical concepts, the ones my textbook introduced
and developed in such a polished way.
Assume that he possesses our previously-mentioned prenumerical background, but
nothing more.
He begins by considering the two activities:
Comparing piles of marbles using a pan balance, and comparing the heights of
towers constructed from toy blocks.
.SKIP TO COLUMN 1
.B0 INDENT 7 TURN OFF "α"
/~\ /~\
/ ~ \ / ~ \
/ ~ \ / ~ \
/ ~ \ / ~ \
/ ~ \ / ~ \
/ ~ \ / ~ \
/ ~ \ oo / ~ \ oo
αααα/ ~ \αααα αααα/ ~ \αααα
~ ~
~ ~
~ ~
αααααααααααα∀αααααααααααα αααααααααααα∀αααααααααααα
/~\
/ ~ \
/ ~ \
/ ~ \ o
/ ~ \αααα
/ ~ ⊂αααααααα⊃
/ ~ ~weighing~
/ ~ ~ marbles~
/ ~ %αααααααα$
ooo / ~
αααα/ ~
~
ααααααααααααααααααααααααα
≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡
αααααα⊃
|
⊂ααααα⊃ |
~ ~ |
~ ~ | ⊂αααααα⊃
εαααααλ | ~piling~
~ ~ | αααααα⊃ ~blocks~
~ ~ | | %αααααα$
εαααααλ | ⊂ααααα⊃ |
~ ~ | ~ ~ |
~ ~ ~ ~
%ααααα$ %ααααα$
.E
.SKIP TO COLUMN 1
When he tires of playing, the mathematician makes a list of the most basic
features of these situations.
.BEGIN NOFILL SELECT 1 PREFACE 0 INDENT 0 TABS 42,82 TURN ON "\" GROUP
.TURN ON "→∞"
.ONCE CENTER PREFACE 2
⊗5FUNDAMENTAL CONCEPTS⊗*
↓_ABOUT WEIGHING MARBLES:_↓\↓_ABOUT PILING BLOCKS:_↓ →↓_COMMON NAME:_↓
There are some marbles to play with.\There are some blocks to play with.→element
The marbles are indistinguishable.\The blocks are indistinguishable.→set
Besides individual marbles, we can\Besides individual blocks, we can→variable
talk about a pile of marbles as a unit.\talk about a tower of blocks as unit.
We can copy a given pile.\We can copy a given tower.→copy
A pile of marbes and a copy of\A tower of blocks and a copy of
that pile are indistinguishable.\that tower are indistinguishable.→ignore
To make pile heavier, add a marble.\We can stack another block onto a tower.→successor
Lighten pile by removing a marble.\Can shorten tower by removing top block.→predecessor
We can tell if piles x,y balance.\Can see if towers x,y are same size.→equality
Can tell if pile x is heavier than y.\We can see if tower x is higher than y.→⊗6>⊗*
We can add one pile to another.\We can stack one tower on top of another.→add
Can remove a pile from a big one.\Can lift a tower off top of a big one.→subtract
Can remove any marble from pile.\We can only remove topmost blocks.→delete/pop
Pile might have no marbles.\There might be no blocks in a tower.→empty set
Can replace each marble in a pile\We can replace each block in a tower
by a copy of some other pile.\by a copy of some other tower.→multiply
A lone marble is also a pile.\A lone block is also a tower.→x↔{x}
etc.\etc.→etc.
⊗8∞≡→⊗*
.E
These correspond to the objects and operators of the theory he is about to develop.
He gives them each a name; I have permitted him to choose the common name for
each concept$$
For brevity, he will call the
empty set 0, and a singleton set 1. + will signify addition, etc.
S(n) is the successor of n (that is, n+1), and D(n) is the predecessor of n
(decrement n; that is, n-1).
$,
but that doesn't matter, of course. Notice that most of these
are not usually considered primitive at all.
The next list he draws up contains several specific observations, translated from
the blocks/marbles manipulation language into these new symbols.
Each relationship is translated from these symbols into the
⊗4other⊗* scenario's language, and checked. Thus all the observations here are
meaningful in both of the manipulative domains:
.COMMENT BEGIN NOFILL SELECT 6 PREFACE 0 INDENT 0 TABS 15,30,45,60,75 TURN ON "\" GROUP;
.WBOX(0,0) TABS 15,30,45,60,75 TURN ON "\" GROUP
.COMMENT ⊗8≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡⊗*;
MBOX $
MBOX 0=0\1=1\¬(0=1)\¬(1=0)\¬(S(1)=1)\¬(0=S(0)) $
MBOX $
MBOX S(0)=1\1=S(0)\S(0)=S(0)\S(1)=S(1)\¬(S(0)=S(1))\¬(S(1)=S(0)) $
MBOX $
MBOX 0+0=0\1+1=S(1)\0+1=1\1+0=1\S(1)+0=S(1)\S(0)+S(0)=S(1) $
MBOX $
MBOX 0⊗7x⊗*0=0\1⊗7x⊗*1=1\0⊗7x⊗*1=0\1⊗7x⊗*0=0\S(S(1))⊗7x⊗*0=0\S(S(1)⊗7x⊗*1=S(S(1)) $
MBOX $
MBOX 1>0\S(1)>1\S(1)>0\¬(0>1)\¬(1>1)\¬(0>0) $
.EBOX
.COMMENT ⊗8≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡⊗*;
.COMMENT E
.COMMENT if the box looks nicer than just ≡≡≡, then use it allover this section. ;
Now comes some simple generalization, either directly from the scenarios,
perceptually from the list just compiled, or syntactically from that list
(e.g., by replacing constants by variables).
Again, each of these generalizations
is intuitively checked in both real-world domains.
.BEGIN NOFILL SELECT 6 PREFACE 0 INDENT 0 TABS 15,30,45,60,75 TURN ON "\" GROUP
⊗8≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡⊗*
If n>0, then D(n) can be performed; else D(n) cannot be performed.
If n>0, then n>D(n)\S(n)>n\\S(n)=n+1\S(S(n))>n
If n=m then S(n)=S(m)\\\If S(n)=S(m) then n=m
D(S(n))=n\\If n>0 then S(D(n))=n\If n>0 then n+m>m
m+S(n)>m\\If n=m then ¬(n>m)\If n>m then ¬(n=m)
If n=m ∧ n>r then m>r\n=n\¬(n>n)\¬(n=S(n))
If n>m ∧ m>r then n>r\If n=m ∧ m=r then n=r\If n=m ∧ r=m then r=n
If n=m then m=n\If n>m then ¬(m>n)\n>m ∨ m>n ∨ n=m
n⊗7x⊗*m = m⊗7x⊗*n\\n⊗7x⊗*S(m)=n⊗7x⊗*m + n\\n+m=m+n
n+S(m)=S(n)+m\\(n+m)+(r+s) = n+((s+m)+r)
n⊗7x⊗*(m⊗7x⊗*r)=(n⊗7x⊗*r)⊗7x⊗*m\S(n)+D(m)=m+n
etc.
⊗8≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡⊗*
.E
Some of these are now considered as axiomatic, as defining the operations and
structures involved; the rest are considered secondary, as theorems following from
the axioms. This process is a slow search.
The most important game is not "finding minimal axiom sets", but rather
"what else can we prove from this set of axioms and theorems".
The AM research goal is to learn how to play this game of theory development.
To summarize idea (1):
The ⊗4motivation⊗* for a new mathematical development is either:
(a) looking at reality and at the mathematics
you have already$$ Then abstract out some interesting features;
formalize those into a
specific set of primitive structures and relations and axioms about those
basic entities $, or (b) given an existing theory, propose some new conjecture or
definition, based on analogy to the real-world roots of this theory,$$ Based
on patterns perceived in empirical
evidence $ or based on analogy to a known interesting theorem in another theory,
or occasionally based only on applying some fruitful method and observing the
result.
↓_Metaphors to other processes_↓
Let's consider some alternate ways of looking at such theory development, some
analogies to processes with which you're probably more familiar.
.BN
λλ Growing a tree -- using heuristic constraints:
Once motivated, the idea is developed and evaluated. At each moment, the
researcher should be working on the most promising of all the proposed ideas.
This process is thus a sophisticated expansion of a tree,
where new conceptual nodes are
"grown" in the most promising area, and where barren dead-ends are eventually
"pruned". To do mathematical research well, it is thus necessary and sufficent
to have good methods for proposing new concepts from existing ones, and for
deciding how interesting each candidate is. That is: effective growing and pruning
strategies.
λλ Exploring a space using an evaluation function:
Consider our core of premathematical knowledge.
By compounding the known concepts and methods,$$Using
abstraction from reality, analogy with existing theories,
the postulational method, and problem-solving techniques $
we can extend this
foundation a little wherever we wish.
⊗4<Visualize: SEVERAL SHORT LINES IN BLACK, EMANATING
FROM A CENTRAL CORE>⊗*.
The incredible variety of alternatives to investigate includes
all known mathematics, much trivia, countless deadends, and so on.
The only "successful" paths near the core
are the narrow ribbons of known mathematics
(perhaps with a few undiscovered other slivers).
⊗4<Visualize SNAKE-LIKE LINES IN RED, twisting away from the core, intersecting>⊗*.
.ONCE INDENT 4
How should we walk through this immense space, with any hope of following the
few, slender branches of already-established mathematics (or some equally successful
new fields)? We must do hill-climbing; as new concepts are formed, decide how
promising they are, always explore the currently most-promising new concept. The
evaluation function is quite nontrivial, and this research may be viewed as an
attempt to study and explain and duplicate the judgmental criteria people employ.
My attempts at codifying such "mysterious" emotive forces as intuition, aesthetics,
utility, richness, interestingness, relevance... indicate that a large but not
unmanagable collection of heuristic rules should suffice.
.ONCE INDENT 4
The important visualization to make
is that with proper evaluation criteria, we convert the
flat picture to
a breath-taking relief map:
the known lines of development become
mountain ranges, soaring above the vast flat plains
of trivia and inconsistency
below.
Occasionally an isolated
hill is discovered near the core;$$ E.g.,
Knuth's ↓_Surreal Numbers_↓ $ certainly
whole ranges lie undiscovered for long periods of time:$$ E.g.,
non-Euclidean geometries $ the terrrain far from the initial core is
not ⊗4at all⊗* explored yet.
.ONCE INDENT 4
Intuition is like vision, letting the explorer observe a distant mountain,
long before he has conquered its intricate, specific challenges.
.ONCE INDENT 4
If the criteria for evaluating interestingness and promise are good enough, then
it should be straightforward to simply push off in any direction, locate some
nearby peak, and follow the mountain range along (duplicating the development in
some field). In fact, by intentionally pushing off in apparently barren directions,
new ranges might be enountered. If the criteria are "correct", then ⊗4any⊗* new
discovery the system makes and likes will necessarily be interesting to humans.
.ONCE INDENT 4
If, as is more likely, the criteria are deficient, this too will tell us much. Before
beginning, we shall strive to include all the obvious factors which enter into
judgmental decisions, with appropriate weights, etc. If the criteria fail, then
we can analyze that failure and learn about one nonobvious factor in evaluating
success in mathematics (and in any creative endeavor). After modifying the criteria
to include this new factor, we can proceed again.
λλ Syntax and Semantics in Natural Language Processing:
Consider assumptions, axioms, definitions, and
theorems to be
syntactic rules for the
language that we call Mathematics.
Thus theorem-proving, and the whole of textbook mathematics, is a purely syntactic
process.
Then these judgmental
criteria I am alluding to would correspond to the semantic knowledge associated
with these more formal methods.
Just as one can upgrade natural-language-understanding by encorporating semantic
knowledge, so AM ensures that it knows the intuitive and empirical roots of
each concept in its repertoire.
.E
. SSSEC(AM as Heuristic Search)
.ONCE TURN ON "{}"
As the title of this section -- and this thesis -- proclaims, AM is a
kind of "heuristic search" program. That must mean that AM is
exploring a particular "space," using some informal evaluation
criteria to guide it. Sections {SECNUM}.{SSECNUM}.2 and
{SECNUM}.{SSECNUM}.3 will present this view in detail. First, let me
define what a heuristic search is. Readers familiar with this concept
may skip the first subsection entirely.
. SSSEC(Heuristic Search)
Heuristic gourmets sometimes find it useful to classify heuristic
searches based on their taste, but for our modest purposes we'll
stick to just one flavor$$ Rocky road. $.
The activity we wish to model using the search must be phrased in
terms of states and operators. The operators are capable of
transforming some states into different ones. Given some initial
configuration of states, the "game" is to repeatedly apply opertors
to states. The objective will either be a particular state, or any
state satisfying some well-defined criteria. Finally, some rules of
thumb must be present, to guide the player as he chooses which
operator to apply to which state next.
This is a rather confused presentation; a superior one is found in
Chapters... of [Nilsson]. Perhaps a more graphical interpretation
will be helpful. A graph is a bunch of little circles, called
⊗4nodes⊗*, which represent the states of the search, plus a bunch of
arrows, called ⊗4arcs⊗*, which represent applications of the
operators.
Each node has a name, which is written inside its little circle, and
each arc may have a label, which is written alongside it.
If an arc points from node X to Y and is labelled F, then
we assume that opeator F was applied to state X, and that the
resultant state was Y.
.B0
⊂∂∂∂∂∂∂∂⊃
~ ~
~ X ~
~ ~ ~
~ ~ ~
~ ~ F ~
~ ~ ~
~ ↓ ~
~ Y ~
~ ~
%∂∂∂∂∂∂∂$
.E
The whole idea of "search" in this
interpretation is to carefully enlarge the graph (to keep drawing new
arcs and nodes).
Each node can have several arcs pointing to it; each arc can point to
several different nodes; and each arc can point ⊗4from⊗* a collection
of different nodes. Thus an arc is like an arrow which can have
multiple heads and tails:
< Diagram of a graph >
.B0
A Z W
~\ / /
~ \ B←∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂C / Q
~ \ ↓ ↑ / /
~ R∂∂→D∂∂∂∂∂∂→E∂∂∂∂∂∂∂→H←∂∂∂∂∂∂∂∂∂∂∂L∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂
~ / \
~ / \
↓/ G
U
.E
Some more terminology may come in handy here. If an arc points from
X to Y, we say that Y is a ⊗4successor⊗* of X, and that X is a
⊗4parent⊗* of Y. If a chain of arrows leads from X to Y... to Z, we
say that Z is a ⊗4descendant⊗* of X, and that X is an ⊗4ancestor⊗* of
Z. If there is one function which can generate all successors of any
given node, we call that the ⊗4successor function⊗* of the search.
Any function which is used to informally rate the "value" of any
given node is called a ⊗4heuristic evaluation function⊗*.
Let's give a few examples.
<<Rewrite this chess example. >
In a chess-playing program, a natural choice is to make each possible
board position correspond to a state. The operators are simply the
legal moves of the game. So there would be an operator called
"Move-rook", which could transform a given state into a large
collection of new ones, which differed from the original one only in
that one rook had been (legally) moved along some rank or file.
Sometimes, this operator would yield ⊗4no⊗* successors (e.g., if one
rook is gone and the other is pinned against the king). The
heuristic evaluationfunction would be a function which looked at any
board situation, and counted material, examined control of key areas
of the board, and in other ways somehow pieced together an opinion
about just how good this board really is (from the standpoint of one
particular player). Each player might have his own evaluation
function, or they might both use the same one, or they might vary
which evaluation function they use depending on recent moves, their
opponent's reputation, random chance, etc. As with many
non-chess-playing humans, there is no sense of a definite goal,
except just prior to mating. Rather, the typical approach is to
generate some of the successors to the current board situation (using
heuristic guidance to only generate plausible moves), and then
evaluate each successor state using the evaluation function. The best
of these states would then be expanded in turn (i.e., all of the
opponent's relies would be considered). If the opponent had ⊗4any⊗*
good reply to one of the moves, that move would then have its rating
lowered accordingly. This process -- which is called ⊗4minimaxing⊗*
-- would be continued as long as time permitted. When forced to halt,
the successor of the current state which had the highest value would
be chosen as the actual successor (i.e., the corresponding move to
alter the board to that state would then be made), and it would
become you oppenent's turn.
<< Give a couple more examples of heuristic search. >
In theorem-provers, each operator is a rule of inference. Applying
an operator adds one more wff to the list of known true statements.
So in some sense ⊗4all⊗* paths lead to a "goal", the problem is to
find a short, direct path. A solution is "good" if it consists of a
very short path leading to an acceptable goal node. The quality of
the path is very important.
. SSSEC(AM's Similarities to Heuristic Search)
AM explores mathematics by selectively enlarging itself: AM ⊗4is⊗* a
body of mathematical knowledge (concepts, plus the wisdom to use them
effectively). AM is a heuristic search program, much like the chess
and theorem-proving programs just desribed. To see this, we must
explain what the nodes of AM's search space are, what the operators
or links are, where the heuristic information comes into play, and
what the evaluation function is.
AM's space can be consisered to consist of a bunch of nodes which are
each a facet/concept pair (a primitive task on AM's agenda of things
to do). In one sense, the successor function is "EVAL" -- i.e.,
given a task, execute it. In a more meaningful sense, the operators
are the heuristic rules that AM uses to suggest new tasks and create
new concepts. While a task is executed (a node is being expanded),
several new tasks may get suggested (other nodes will be pointed to:
the successor nodes of the current one). Each arc in the space will
thus be a reason for executing the node it points to. When a task is
suggested (by some heuristic), it is tagged with a list of reasons
explaining why it might be worthwhile. The evaluation function need
only scan all the nodes (all the facet/concept pairs), and examine
each's list of reasons. In fact, experiments show that the precise
form of the evaluation function is unimportant$$ See experiment 3,
page..., Section... $. AM has a definite algorithm for rating ithe
nodes of its space, and yet certainly has no specific goal criteria:
it can't quit just because a dynamite task has been proposed. AM goes
on forever$$ Technically, forever is about 100,000 list cells $.
This is probably unclear, so perhaps a picture will help:
<<Draw a graph: AM as heur search. >
Notice that... <<Things to notice about that graph. >
. SSSEC(AM's Differences from Heuristic Search)
There are several difficulties and anomalies in forcing AM into the
heuristic search paradigm. For example, the main entities that AM is
concerned with are concepts, and yet the nodes of its search space
are merely pointers to particular empty facets. In a typical
heuristic search, the nodes are intimately ties up with the intuitive
"states" of the situation.
Another anomaly is that the operators which AM uses to enlarge and
explore the space of concepts are themselves mathematical concepts
(e.g., some heuristic rules result in the creation of new ones). Thus
AM should be viewed as a mass of knowledge which enlarges ⊗4itself⊗*
repeatedly. As far as I know, all previous systems kept the
information they "discovered" quite separate fromthe knowledge they
use to make discoveries$$ Of course this is typically because the two
kinds of knowledge are very different: For a chess-player, first kind
is "good board positions," and the second is "strategies for making a
good move." $.
Let me reemphasize at this time that AM has no well-defined target
concepts or target relationships. Rather, its "goal criteria" -- its
sole aim -- is to preserve the interestingness level of the
activities it performs, of the top tasks on the agenda. We don't
care what AM does -- or misses -- so long as it spends its time on
plausible tasks. There is no fixed set of theorems that AM should
discover (AM is thus not like a typical ⊗4problem-solver⊗*), no fixed
set of traps AM should avoid (AM is thus very different from
⊗4game-players⊗* like chess programs).
For example, there is no stigma attached to the fact that AM never
discovered real numbers$$ There are many "nice" things which AM
didn't -- and can't -- do: e.g., devising ↓_geometric_↓ concepts from
its initial simple set-theoretic knowledge. Paul Cohen has indicated
that this would be a supremly exciting development: the invention of
geometry -- and all the tools it provides -- by a system which has
neither geometric intuition built into it, nor definitions of
geometric concepts. I do not think this possible; see the section on
the limitations of AM, page...$; it was rather surprising that AM
managed to discover ⊗4natural⊗* numbers! Even if it hadn't done
that, it would have been just as acceptable$$ Acceptable to whom? Is
there really a domain-invariant criterion for judging the quality of
AM's actions? See the discussion in Section..., on Page... $ if AM
had simply gone off and developed ideas in set theory.
. SSSEC(Summary: Agendas and Heuristic Search)
<< Why the agenda mechanism is well-suited to heur search algorithms
>
Consider Nilsson's description of depth-first searching, and of
breadth-first searching. He has us maintain a list of "open" nodes.
Repeatedly, we pluck the top one and expand it. In the process, some
new nodes may be added to the Open list. In the case of depth-first
searching, they are added at the top; for breadth-first search, they
must be added at the bottom. For heuristic search, or "best-first"
search, they are evaluated in some numeric way, and then "merged"
into the already-sorted list of Open nodes.
This process is clearly analogous to the agenda mechanism AM uses.
It is also the same as the one used in KRL [reference], and used
earlier in Dendral$$ The "Predictor" part of DENDRAL. See [MI4 ref.].
$ The main difference is that in AM, symbolic reasons are used
(albeit in trivial token-like ways) to decide whether -- and how much
-- to boost the priority of a task when it is suggested again.
Agendas are the canonical kind of data structure for a "best-first"
process.